Skip to content

NdArray Initializers API #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

karllessard
Copy link
Collaborator

This draft shows how we could hydrate a just-allocated array using a new API that focuses on a stateful builder approach rather than the positional endpoints setXXX(value, coords) exposed by the NdArray itself.

Not only it simplifies the task of initializing an array with explicit values but also it allows us to allocate an empty sparse array and then let the user initialize its data (right now, the user have to allocate and initialize the data of the indices and values subarrays first before being able to allocate a sparse array).

Example of usage:

NdArray<String> array = NdArrays.ofObjects(String.class, Shape.of(3, 2), initializer -> {
     initializer.byVectors()
         .put("Cat", "Dog")
         .put("House") // partial initialization
         .to(2)
         .put("Orange", "Apple");
});
// -> [["Cat", "Dog"], ["House", null], ["Orange", "Apple"]]

Note: this is a newer version of #4

@karllessard
Copy link
Collaborator Author

@Craigacp : This is still in draft mode as it is missing most of the datatypes, right now only arrays of Double and Objects (like String) are being demonstrated. Once we are satisfied, I'll update it for all other missing types and turn it to a real PR.

I couldn't unfortunately reopen the previous version of the PR as I forced-push it while it was closed and GitHub does not like that. Still, I replied and addressed your comments you put in there and we can continue the discussion here for the few remaining points.

@@ -568,6 +591,23 @@ public static DoubleNdArray wrap(Shape shape, DoubleDataBuffer buffer) {
return DoubleDenseNdArray.create(buffer, shape);
}

/**
* Creates an Sparse array of doubles of the given shape, hydrating it with data after its allocation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says "hydrating" and the one above says "initializing". Similarly for the rest of the javadoc in this class, and also the parameter names.

@@ -31,6 +31,20 @@
@SuppressWarnings("unchecked")
public abstract class AbstractDenseNdArray<T, U extends NdArray<T>> extends AbstractNdArray<T, U> {

abstract public DataBuffer<T> buffer();

public NdArraySequence<U> elementsAt(long[] startCoords) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of this method are still confusing to me. It doesn't check that startCoords is in bounds and the way in which it interacts with the starting position is odd (as I don't see why iteration from a particular starting co-ordinate should have fewer dimensions than the thing you're iterating).

@@ -189,6 +189,15 @@ public long positionOf(long[] coords) {
return position;
}

public boolean increment(long[] coords) {
for (int i = coords.length - 1; i >= 0; --i) {
if ((coords[i] = (coords[i] + 1) % shape.get(i)) > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this mutates the state of the input array, and returns true if it could increment it and remain in bounds?

* Reset the position of the initializer in the <code>NdArray</code> so that the next vectors provided are written
* starting from the given {@code coordinates}.
*
* <p>Note that it is not possible to move backward within the array, {@code coordinates} must be equal or greater
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it throw IllegalArgumentException if the co-ordinates are less than the current position? Same question for the other to methods.

/**
* Per-scalar initialization of an {@link NdArray}.
*
* <p>Scalar initialization writes sequentially to an <code>NdArray</code> each individual values provided. Position
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/values/value/

*
* @see BaseNdArrayInitializer
*/
public interface DoubleNdArrayInitializer extends BaseNdArrayInitializer<Double> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should define the behaviour of skipping forward on a primitive array. I would assume it will leave the unspecified values as zero, but it should be documented. I think that might mean adding docs to the to method in each of the inner interfaces.

@@ -0,0 +1,68 @@
package org.tensorflow.ndarray.impl.initializer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License & copyright header.


public abstract class AbstractNdArrayInitializer<V extends AbstractNdArray<?, ?>> {

protected static long[] validateNewCoords(long[] actualCoords, long[] newCoords) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this return the argument?


protected long[] validateRankCoords(int elementRank, long[] coords) {
DimensionalSpace dimensions = array.dimensions();
int dimensionIdx = dimensions.numDimensions() - elementRank - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference on chains of subtractions or divisions is to add parens to make the order of operations explicit, but I won't hold you to it.

if (coords.length != newCoordinates.length) {
throw new IllegalArgumentException("New coordinates are not for the initialized dimension");
}
resetTo(Arrays.copyOf(newCoordinates, newCoordinates.length));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a bunch of copying in this class and I'm not sure why it's all necessary. Can we avoid it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants